Method to prefetch data from system memory using a bus interface unit

ABSTRACT

A method and system for prefetching data from system memory to a central processing unit (CPU). The system includes one or more DRAMs connected to a high speed bus, a CPU and a bus interface unit that allows the CPU to communicate with the high speed bus. The bus interface unit contains logic circuitry, so that when the CPU generates a read memory access request for information associated with a first address, the interface unit generates a request packet for the information and prefetch information associated with a prefetch address. The bus interface unit creates the request packet by increasing the number of addresses originally requested by the CPU. The interface then sends the request packet to the system memory device, which retrieves and returns the requested data. The interface may include a pair of buffers which store both the information requested by the CPU and the speculative information. When the CPU generates a subsequent request, the interface compares the addresses requested with the addresses of the data in the prefetch buffer. If the buffer contains the addresses, the data is sent to the processor. The prefetch buffer is directly addressable so that any line within the buffer can be retrieved.

[0001] This is a continuation-in-part of a co-pending United States patent application entitled “Method and Apparatus for Prefetching Data from System Memory to a Central Processing Unit” (Ser. No. 08/287,704) which is a continuation of a United States patent application entitled “Method and Apparatus for Prefetching Data from System Memory” (Ser. No. 07/900,142), now abandoned.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to a method and system for reading data from a memory device through a prefetching technique.

[0004] 2. Description of Related Art

[0005] It is commonly known that computer architectures include a microprocessor that reads data from and writes data to system memory which usually includes dynamic random access memory (“DRAM”). DRAM is used in system memory because it provides an inexpensive means of obtaining a large memory space. Typically, a computer system may have a number of DRAM chips, each having a plurality of addressable memory locations.

[0006] Many microprocessors read data from system memory in multiple byte blocks. Accessing multiple bytes of data from memory is usually slower than the speed of the processor, causing the processor to wait for the data. To reduce this access time, some computer architectures incorporate various levels of cache, which provide smaller yet faster blocks of addressable memory. When the processor generates a read request, the request is first sent to cache. If the processor determines that the cache does not contain the requested data (i.e., a cache miss), the read request is sent to system memory. The data is retrieved from the system memory, and thereafter written to the processor and possibly the cache for subsequent use.

[0007] To reduce the cache “miss” rates, some computer systems include prefetch algorithms. When the processor reads data, the data associated with the successive addresses is also fetched and stored in cache. For example, if the processor requests addresses A0-A7, addresses A8-A15 will also be fetched from memory. The prefetch algorithm increases the “hit” rate of the subsequent read request from the processor.

[0008] Such a prefetch method is disclosed in the publication by Norman J. Jouppi, “IMPROVING DIRECT-MAPPED CACHE PERFORMANCE BY THE ADDITION OF A SMALL FULLY-ASSOCIATIVE CACHE AND PREFETCH BUFFERS”, The 17th Annual International Symposium on Computer Architecture, May 28-31, 1990, pages 364-373. The system disclosed by Jouppi teaches the use of a stream buffer between the first level (L1) and second level (L2) caches of the CPU. When there is a cache miss in the L1 cache, the data is fetched from the L2 cache. When fetching from the L2 cache, the system also fetches successive addresses and stores the additional data in the stream buffer. When the CPU generates a subsequent read, the request is supplied to both the L1 cache and the stream buffer. If the stream buffer contains the addresses requested, the data is sent to the processor.

[0009] The addition of the stream buffer therefore improves the hit rate without polluting the L1 cache. If neither the stream buffer nor the L1 cache has the addresses, the data is fetched from the L2 cache along with a prefetch that replaces the data within the stream buffer. The stream buffer of the Jouppi system has a first in first out (“FIFO”) queue, so that if the requested data is not in the top line of the buffer, the data cannot be retrieved. The requested data is then fetched from the second level cache. The stream buffer will be flushed and restarted at the missed address.
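
The head-only behavior of a FIFO stream buffer described above can be illustrated with a minimal Python sketch. It is not taken from the Jouppi publication: the class name, the buffer depth of four lines and the refill policy on a hit are assumptions made only for illustration.

```python
from collections import deque


class FifoStreamBuffer:
    """Head-only FIFO of prefetched line numbers (depth is a hypothetical value)."""

    def __init__(self, depth=4):
        self.depth = depth
        self.queue = deque()
        self.next_line = 0

    def lookup(self, line):
        """Return True on a hit; only the head entry is ever compared."""
        if self.queue and self.queue[0] == line:
            self.queue.popleft()
            # Assumed refill policy: keep prefetching the next sequential line.
            self.queue.append(self.next_line)
            self.next_line += 1
            return True
        # Miss, even if the line sits deeper in the queue: flush and restart
        # with the lines that follow the missed one.
        self.queue = deque(range(line + 1, line + 1 + self.depth))
        self.next_line = line + 1 + self.depth
        return False
```

In this sketch a request for a line that is present but not at the head still counts as a miss and flushes the queue, which is the limitation that a directly addressable prefetch buffer avoids.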

[0010] Although the Jouppi concept improves the internal performance of multilevel cache systems, it does not solve the inherent latency problems between the CPU and system memory. Prefetches have not been desirable between a CPU and system memory because the extra time needed to read the additional data slows down the processor. The increased hit rate would not compensate for the delay in memory reads, thereby resulting in an inefficient system. It would therefore be desirable to have a system that would provide an efficient way of prefetching data from system memory.

SUMMARY OF THE INVENTION

[0011] Adapted for a computer system including a central processing unit (“CPU”), system memory and a bus, a bus interface unit is coupled between the CPU and the bus to obtain information as well as prefetch information from the system memory. The bus interface unit receives a first read request for information associated with a first address of system memory. The bus interface unit then produces and places onto the bus a request packet, to be read by system memory, requesting the information and the prefetch information associated with the speculative addresses. Thereafter, the system memory provides the information and the prefetch information to said bus interface unit along the bus. The information is transmitted to the CPU, while the prefetch information may be transmitted to the CPU depending on the nature of a subsequent request by the CPU.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The features and advantages of the present invention will become more readily apparent to those of ordinary skill in the art after reviewing the following detailed description and accompanying drawings, wherein:

[0013] FIG. 1 is a block diagram of a computer system including a bus interface unit supporting.

[0014] FIG. 2 is a circuit diagram of the bus interface unit of FIG. 1 including a logic circuit and a prefetch circuit.

[0015] FIG. 3 is a circuit diagram of the logic circuit of the bus interface unit of FIG. 2.

[0016] FIG. 4 is a circuit diagram of the prefetch circuit of the bus interface unit of FIG. 2.

[0017] FIG. 5 is a schematic of an alternate embodiment of the bus interface unit.

DETAILED DESCRIPTION OF THE INVENTION

[0018] An apparatus and method for efficiently reading data from system memory through prefetch techniques is described below. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the present invention. However, it is apparent to a person of ordinary skill in the art of circuit design that the present invention may be practiced without these specific details. In other instances, well known operations, functions and devices are not shown in order to avoid obscuring the present invention. Moreover, a specific example has been created for the sole purpose of illustrating the present invention, but should not be construed as a limitation on the scope of the invention.

[0019] In the detailed description, a number of terms are frequently used to describe certain logic and define certain representations herein. For example, a “select element” is defined as one or more multiplexers configured in parallel or cascaded in series to produce a desired output. A “byte” is generally defined as a plurality of information bits (i.e., binary values of address, data or control) transferred in parallel. A “request packet” is defined as a sequence of six one-byte information packets containing address, control and request length information which are transferred in series according to a format defined in “Rambus™ Product Catalog” (order no. 903010042081) published by Rambus, Inc. of Mountain View, Calif.

[0020] Referring to the drawings more particularly by reference numbers, FIG. 1 shows a system 100 employing the present invention. The system 100 comprises a central processing unit (“CPU”) 110, a bus interface unit 120 and system memory 125 including at least one Dynamic Random Access Memory (“DRAM”) device 130. The CPU 110 is coupled to the bus interface unit 120 through a pair of unidirectional buses 135 and 140. The bus interface unit 120 is coupled to the system memory 125 through a bidirectional bus 150 thereby enabling the CPU 110 to communicate with the system memory 125. The bus 150 is configured to support the Rambus protocol.

[0021] The CPU 110 is capable of generating read and write memory access requests to the system memory 125. The information transferred includes data and/or instructions, both of which will be generically referred to as “information” unless otherwise distinguished. In general, the CPU 110 generates a read memory access request in sixteen (16) byte bursts corresponding to the byte length of a cache line. However, it is contemplated that bursts can be appropriately altered to correspond with cache lines of 32 or 64 bytes in length. The read memory access requests include addresses that are to be read from system memory 125.

[0022] System memory 125 preferably comprises at least one DRAM device manufactured, for example, by Rambus, Inc. of Mountain View, Calif. Each DRAM preferably includes two blocks of main memory 160 and 165, each block including a 36×256×256 array of memory cells wherein four (4) bits of the 36 bit-wide block are used for parity. Each block 160 and 165 operates in conjunction with its dedicated cache 170 and 175 having a cache line of sixteen (16) bytes and storing approximately one kilobyte (“Kbyte”) of information. Preferably, the bus 150 is one byte wide such that information is serially transferred one byte at a time. The data transfer rate on the bus 150 is preferably on the order of 500 Mbytes/sec which translates into a clock (“CLK”) period of 2 nanoseconds.
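
The relationship between the transfer rate and the per-byte period can be checked with a short Python calculation. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the burst time covers the data bytes alone, ignoring request packet and acknowledge overhead.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
TRANSFER_RATE_BYTES_PER_SEC = 500_000_000  # "on the order of 500 Mbytes/sec"
BUS_WIDTH_BYTES = 1                        # one byte transferred at a time


def byte_period_ns(rate=TRANSFER_RATE_BYTES_PER_SEC, width=BUS_WIDTH_BYTES):
    """Time in nanoseconds to move one bus-width transfer at the given rate."""
    return NANOSECONDS_PER_SECOND * width // rate


def burst_time_ns(num_bytes):
    """Data-only transfer time for a burst, excluding protocol overhead."""
    return num_bytes * byte_period_ns()
```

Under these assumptions a sixteen-byte cache line occupies the bus for 32 nanoseconds, and a request extended by one prefetched cache line occupies it for 64 nanoseconds.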

[0023] Referring now to FIG. 2, an illustrative embodiment of the bus interface unit 120 is shown. The bus interface unit 120 comprises a transmitting sub-unit 200 that converts a read or write memory access request from the CPU into a request packet being a sequence of one-byte information packets formatted according to the Rambus protocol. The bus interface unit 120 further includes a receiving sub-unit 255 that reconverts bytes of information from the system memory 125 into a format configured for the CPU. The transmitting sub-unit 200 includes a logic circuit 205, an incrementing address circuit 215, a prefetch address latch 220, a prefetch circuit 225, a first select element 235 and a comparator 240.

[0024] The CPU is coupled to the logic circuit 205 through the unidirectional bus 135 which includes address lines 136, a read/write line 137, length request lines 138 and byte enable lines 139. The address lines 136 are illustrated to be 32 bits wide to support a four gigabyte address space while the read/write line 137, length request lines 138 and byte enable lines 139 are represented as having bit widths of one, six and eight bits, respectively. It is contemplated that such bit widths are chosen for illustrative purposes and may be varied accordingly.

[0025] The address lines 136 are used for transferring an address of information requested by the CPU (“request information”) to be read from or written to system memory. For clarity's sake, this address is referred to as the “first address”. The address lines 136 are further coupled to both the incrementing address circuit 215 and the comparator 240. The address lines 136 are coupled to the incrementing address circuit 215 in order to produce a speculative address by incrementing the first address on the address lines 136. Thereafter, the speculative address is transferred via address lines 241 to the prefetch address latch 220 and temporarily stored therein upon assertion of an enable signal via enable line 245b. The speculative address is used in determining whether information associated with a subsequent CPU request has already been “prefetched” and stored in the prefetch address latch 220 as discussed below.
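
As a minimal Python sketch of the incrementing address circuit, the speculative address can be modeled as the first address plus one cache line. The description does not state the increment amount or the overflow behavior; the sixteen-byte step (taken from the cache line size of the illustrative embodiment) and the 32-bit wraparound are assumptions.

```python
CACHE_LINE_BYTES = 16        # assumed increment: one cache line
ADDRESS_MASK = 0xFFFF_FFFF   # 32 address lines; wraparound is an assumption


def speculative_address(first_address, line_bytes=CACHE_LINE_BYTES):
    """Model of the incrementing address circuit: first address plus one line."""
    return (first_address + line_bytes) & ADDRESS_MASK
```

With a first address of A0, this yields A16, the start of the next cache line.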

[0026] In addition, the read/write line 137 is used to signal whether the CPU desires to read information from or write information into system memory. The length request lines 138 are used to indicate the amount of information requested by the CPU (preferably at least one byte), while the byte enable lines 139 are used to indicate the number of bytes to be written to a selected address location in system memory.

[0027] Referring now to FIG. 3, the logic circuit 205 operates as a parallel-to-byte serial converter which receives information bits of the read or write memory access request from the CPU via lines 136-139 and serially transfers the sequence of one-byte information packets into the prefetch circuit 225 through lines 245a. Collectively, these information packets include, but are not limited to, the address requested by the CPU, a number of bytes requested (length or byte enable) and control information indicating the type of transaction (read or write).
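
The parallel-to-byte serial conversion can be sketched in Python as packing the request fields into six one-byte information packets. The byte layout below (four address bytes, one control byte, one length byte) is purely hypothetical and is not the actual Rambus request packet format, which is defined in the cited catalog.

```python
def serialize_request(address, is_read, length):
    """Pack a request into six bytes using a hypothetical layout.

    Layout (an assumption, not the Rambus format):
    bytes 0-3 = address, byte 4 = control (1 for read), byte 5 = length.
    """
    if not 0 <= address < 2**32:
        raise ValueError("address must fit on 32 address lines")
    if not 0 <= length < 2**6:
        raise ValueError("length must fit on six length request lines")
    control = 1 if is_read else 0
    return address.to_bytes(4, "big") + bytes([control, length])
```

Each byte of the result would then be placed on the one-byte-wide bus in turn, one byte per clock.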

[0028] The serial transmission of the information packets is controlled by appropriately routing the information bits into a select element 206 and selectively outputting a byte at a time under direction of a well-known control circuit 207 operating in accordance with the Rambus protocol. The control circuit 207 generally allows the serial single-byte transmission of the information packets except if the control circuit detects, during a read memory access request, that the first address is identical to the speculative address. Such detection is accomplished by monitoring whether an output line 242 from the comparator is asserted (logic “1”) or not. If the output line is asserted, the select element 206 is disabled from transmitting the information packets to the prefetch circuit 225, and the bus interface unit instead uses information previously stored in a prefetch buffer 265 of the receiving sub-unit 255. If the output line 242 is not asserted, indicating no match, the logic circuit asserts the enable line 245b to allow the speculative address to be stored in the prefetch address latch 220.

[0029] In the event that the first address is not equal to the speculative address, the prefetch circuit 225 receives information packets. As shown in FIG. 4, the prefetch circuit 225 comprises a select element 226, an adder 227, an offset storage element 228, a staging register 229 and control logic circuit 230. The select element 226 receives the sequence of information packets from the logic circuit 205 and one information packet from the adder 227 through signal lines 231. This information packet from the adder 227 is the sum of (i) the length request information provided by lines 243, which are coupled to the length request lines 138, and (ii) an offset from the offset storage element 228. The offset is a binary representation equal to the number of bytes of “prefetch information” requested in addition to the request information. The prefetch information is typically equal to a cache line in size (sixteen bytes for this illustrative embodiment). Thus, the system memory provides more information than requested by the CPU.

[0030] During a cache line read request, the prefetch circuit 225 monitors the sequence of information packets for the length request information and, upon detecting the length request information, the control logic circuit 230 selects the output of the adder 227 to increase the number of bytes of information retrieved from system memory before the addresses are sent to system memory. The information packets propagate in series through the select element 226 and into the staging register 229 clocked at CLK. The staging register 229 is merely used for timing purposes for transmission to the first select element 235 via lines 247.

[0031] For example, if the CPU issues a read request for a cache line of sixteen bytes addressed by A0-A15, the prefetch circuit 225 will alter the length request to reflect two cache lines addressable by A0-A31. The bus interface unit 120 would then send the read request to system memory requesting information associated with the addresses A0-A31 which would be subsequently transferred from system memory to the bus interface unit 120. The prefetch circuit 225 may also contain logic to ensure that the speculative addresses do not extend into a different DRAM. Depending upon the characteristics of the CPU, the prefetch circuit 225 may generate an additional request instead of changing the addresses requested.
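
The length adjustment performed by the adder and offset storage element can be sketched in Python as follows. The record type, the function name and the `dram_bytes` boundary check are hypothetical; the description says only that logic may keep speculative addresses within one DRAM, without stating how. Leaving write requests unmodified is likewise an assumption drawn from the text's focus on read requests.

```python
from dataclasses import dataclass, replace

CACHE_LINE_BYTES = 16  # offset value for the illustrative embodiment


@dataclass(frozen=True)
class MemoryRequest:
    address: int
    is_read: bool
    length: int


def with_prefetch(request, offset=CACHE_LINE_BYTES, dram_bytes=None):
    """Return the request with its length increased by the prefetch offset.

    dram_bytes is a hypothetical per-device capacity; when given, the
    prefetch is skipped if it would extend into a different DRAM.
    """
    if not request.is_read:
        return request  # assumption: writes pass through unchanged
    last_address = request.address + request.length + offset - 1
    if dram_bytes is not None:
        if last_address // dram_bytes != request.address // dram_bytes:
            return request
    return replace(request, length=request.length + offset)
```

For the example above, a sixteen-byte read starting at A0 becomes a thirty-two-byte read covering A0-A31, while the address itself is left unchanged.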

[0032] Referring back to FIG. 2, the first select element 235 receives as a first input information packets from the prefetch circuit 225 and data directly from the CPU via data lines 248 as a second input. The first select element 235 is controlled by the logic circuit 205 via select line 245c. For a read memory access request, the logic circuit 205 only selects the information packets from the prefetch circuit 225 to be propagated from the first input of the first select element 235, along output lines 249 and onto bus interface pads 250 for later transmission through the bus 150. However, for a write memory access request, the logic circuit 205 first selects the first select element 235 to propagate the information packets to the bus interface pads 250 and, after completing the write memory access request, the logic circuit 205 selects the first select element 235 to propagate write data from data lines 248 to the bus interface pads 250.

[0033] Referring back to FIG. 1, when the CPU generates a read memory access request, the bus interface unit 120 takes the addresses requested and generates and transmits the request packet onto the bus 150. Each DRAM of the system memory 125 monitors the bus 150. The addresses are compared with the addresses stored in one of the caches 170 and 175. If the cache contains the requested information addressed at the first address, the DRAM 130 provides an asserted acknowledge signal to the bus interface unit 120 and transmits the information onto the bus 150, byte by byte. If the requested information is not within cache, the DRAM 130 transmits a negative acknowledge signal to the bus interface unit 120 and performs an internal cache fetch. The internal cache fetch transfers the requested information from main memory 160 or 165 into its cache 170 or 175, respectively. The bus interface unit 120 then resubmits a read memory access request onto the bus 150. The DRAM 130 now has the requested information in cache, which is then transmitted to the bus interface unit 120. Because most CPUs cannot retrieve information byte by byte every 2 nanoseconds, the bus interface unit 120 has a CPU buffer 285 that stores the data from the bus for subsequent retrieval by the CPU. The CPU buffer 285 converts bytes from a second select element 275 into 32-bit parallel data for the CPU.
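
The acknowledge, negative acknowledge and resubmit sequence can be illustrated with a simplified Python model. The class and function names, the single cached row of 1024 bytes and the retry limit are assumptions for illustration; the model checks only the first address against the cache, as the text describes, and omits all bus timing.

```python
class DramModel:
    """Toy DRAM with a one-row cache (row size is a hypothetical value)."""

    def __init__(self, contents, row_bytes=1024):
        self.contents = contents
        self.row_bytes = row_bytes
        self.cached_row = None

    def request(self, address, length):
        """Return (acknowledged, data) for a read starting at address."""
        row = address // self.row_bytes
        if row == self.cached_row:
            return True, self.contents[address:address + length]
        # Negative acknowledge, then internal fetch of the row into cache.
        self.cached_row = row
        return False, b""


def read_with_retry(dram, address, length, max_attempts=2):
    """Resubmit the request after a negative acknowledge."""
    for attempt in range(1, max_attempts + 1):
        acknowledged, data = dram.request(address, length)
        if acknowledged:
            return data, attempt
    raise RuntimeError("DRAM never acknowledged the request")
```

In this model the first access to a row takes two attempts, and later accesses to the same row are acknowledged on the first attempt.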

[0034] Referring again to FIG. 2, the receiving sub-unit 255 of the bus interface unit 120 comprises a de-select element 260, a prefetch buffer 265, an address select circuit 270, the second select element 275, a tag element 280, the CPU buffer 285 and control logic 290. The de-select element 260 is controlled by the address select circuit 270 to transfer the information from the bus 150 to either the prefetch buffer 265 or the second select element 275. The address select circuit 270 stores the number of bytes requested by the CPU through the length request lines 138 and counts each byte of information received from system memory through the bus 150. Thus, continuing the above-described example, the request information (information from A0-A15) would be routed to the second select element 275 via signal lines 261 while the prefetch information (information from A16-A31) would be alternatively routed for storage in the prefetch buffer 265 via signal lines 262.
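
The byte-counting behavior of the address select circuit and de-select element can be sketched in Python as follows. The function name and the use of byte strings are illustrative assumptions; the hardware routes bytes as they arrive rather than collecting them first.

```python
def route_bytes(incoming, requested_length):
    """Split a serial byte stream into request and prefetch information.

    The first requested_length bytes go toward the CPU path; every byte
    after that goes to the prefetch buffer.
    """
    to_cpu = bytearray()
    to_prefetch = bytearray()
    for count, byte in enumerate(incoming):
        if count < requested_length:
            to_cpu.append(byte)
        else:
            to_prefetch.append(byte)
    return bytes(to_cpu), bytes(to_prefetch)
```

For the running example, thirty-two incoming bytes with a requested length of sixteen are split into the information for A0-A15 and the prefetch information for A16-A31.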

[0035] In order to increase the operational speed of the system, the bus interface unit 120 is configured to include the comparator 240 which checks whether the CPU is issuing a read memory access request for information that has already been prefetched by a prior read memory access request. This is done by comparing the address of the current read memory access request to the speculative address stored in the prefetch address latch 220 and provided to the comparator 240. If the addresses are equal and the tag element 280 is set, indicating that the prefetch buffer 265 is storing valid prefetch information, the control logic 290 selects the second select element 275 so that the prefetch information is transferred from the prefetch buffer 265 to the CPU buffer 285 through signal lines 266 and 276. Moreover, the logic circuit 205 is disabled since no information needs to be retrieved from system memory. However, if the addresses are not equal, the process continues as described above.

[0036] More specifically, using the specific example described above for illustrative purposes, for a read memory access request, the request information associated with A0-A15 is input into the second select element 275 via lines 261. Since the tag element 280 is initially cleared, the output from the control logic 290 selects the second select element 275 to transmit the request information to the CPU buffer 285 for conversion to parallel data. Thereafter, the prefetch information associated with A16-A31 is stored in the prefetch buffer 265 causing the tag element 280 to be set.

[0037] Upon the CPU issuing another request, for example a read memory access request, the comparator circuit 240 compares the address produced by the read memory access request with the speculative address stored in the prefetch address latch 220. If these addresses are identical, the comparator 240 asserts the signal line 242 which disables the logic circuit 205 to prevent it from transferring information to the prefetch circuit 225 and propagates a logic “1” to a first input of the control logic 290. Since the tag element 280 is set from the prior read memory access request, the control logic 290 asserts its select lines 291 to allow the prefetch information from the prefetch buffer 265 to be transmitted to the CPU buffer 285. If there is a write request to an address which has previously been prefetched and is stored in the prefetch buffer 265, the tag element 280 is cleared and the information is overwritten or cleared.
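
The interaction of the comparator, the prefetch address latch and the tag element can be modeled with a small Python class. All names are hypothetical, and two details are assumptions: a write is treated as hitting the buffer when it falls anywhere within the prefetched range, and the buffer is left valid after a read hit because the description does not say otherwise.

```python
class PrefetchState:
    """Latch, tag and buffer for a single prefetch buffer."""

    def __init__(self):
        self.latched_address = None  # prefetch address latch
        self.tag_valid = False       # tag element
        self.buffer = b""            # prefetch buffer

    def store_prefetch(self, speculative_address, data):
        """Record newly prefetched information and set the tag."""
        self.latched_address = speculative_address
        self.buffer = data
        self.tag_valid = True

    def read(self, address):
        """Return buffered data on a hit, or None if memory must be read."""
        comparator_match = address == self.latched_address
        if comparator_match and self.tag_valid:
            return self.buffer
        return None

    def write(self, address):
        """Clear the tag if the write touches prefetched information."""
        if not self.tag_valid:
            return
        end = self.latched_address + len(self.buffer)
        if self.latched_address <= address < end:
            self.tag_valid = False
            self.buffer = b""
```

A read to the latched address is served from the buffer only while the tag is set; after a write into the prefetched range, the same read falls through to system memory.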

[0038] Referring now to FIG. 5, a second illustrative embodiment of the bus interface unit 120 may include a plurality of prefetch buffers 300 and 310 in which one of these prefetch buffers (i.e., the instruction prefetch buffer 300) is configured to store instructions while the other prefetch buffer (i.e., the data prefetch buffer 310) is used to store data. It is contemplated, however, that multiple data or instruction prefetch buffers may be employed simultaneously by altering the bus interface unit 120 in a manner similar to that described below.

[0039] The isolation of the instruction prefetch buffer 300 from the data prefetch buffer 310 allows one type of information to be retrieved by the CPU without purging the prefetch buffer for the other type. This increases the “hit” rate within the prefetch buffers 300 and 310. Computer programs will typically run with consecutive lines of instruction or data. The successive lines can be interrupted with a request for data or instruction. Such an interruption can degrade the performance of the speculative prefetch. For example, in a system with one prefetch buffer (as shown in FIG. 2), the CPU may first request an instruction, wherein a prefetched instruction is stored in the prefetch buffer. The CPU may then request data, which is not in the prefetch buffer and must be retrieved from memory. The bus interface unit 120 would prefetch the data and overwrite the instructions with the data. If the CPU subsequently requests instructions, the requested instructions must be retrieved from memory because the prefetch buffer now contains data. With the dual buffer system as shown, the original speculative instructions will still exist in buffer 300 when the CPU generates the subsequent instruction request.
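
The benefit of separate instruction and data prefetch buffers can be illustrated with a small Python simulation that tracks only the latched speculative addresses. The function name, the request encoding and the sixteen-byte line size are assumptions; buffer contents and tags are omitted for brevity.

```python
def count_prefetch_hits(requests, split_buffers, line_bytes=16):
    """Count reads served from a prefetch buffer.

    requests is a sequence of (kind, address) pairs, where kind is "I"
    for instructions or "D" for data. With split_buffers False, all
    requests share a single latch, as in the one-buffer embodiment.
    """
    latches = {}
    hits = 0
    for kind, address in requests:
        key = kind if split_buffers else "shared"
        if latches.get(key) == address:
            hits += 1
        else:
            # Miss: fetch from memory and latch the next line's address.
            latches[key] = address + line_bytes
    return hits
```

For the sequence instruction at 0, data at 0x1000, instruction at 16, the shared buffer yields no hits because the data request overwrites the latched instruction address, while the split buffers yield one hit.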

[0040] In order to configure the bus interface unit 120 to support the plurality of buffers 300 and 310, additional logic circuitry must be duplicated to operate in parallel. As shown, for two prefetch buffers 300 and 310, the transmitting sub-unit 200 is altered to include two prefetch address latches 220a and 220b and two comparators 240a and 240b operating in parallel. The prefetch address latches 220a and 220b are enabled by logically AND'ing an enable signal from the logic circuit 205, asserted as discussed in reference to FIG. 2, and a CPU INSTRUCTION/DATA control signal from the CPU via control line 315 to indicate whether the CPU request is for instructions or data.

[0041] In addition, the receiving sub-unit 255 is altered by including the two prefetch buffers 300 and 310 with corresponding tag elements 320 and 330, respectively. Moreover, the de-select element 260 includes those output lines 263-265 being inputs for the instruction prefetch buffer 300, the data prefetch buffer 310 and the second select element 275, respectively. Moreover, the de-select element 260 is required initially to transmit request information into the second select element 275 and transmit the prefetch instruction or prefetch data to the instruction prefetch buffer 300 or data prefetch buffer 310, respectively.

[0042] While certain exemplary embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the present invention and that the invention not be limited to the specific arrangements and constructions shown and described, since various other modifications may occur to those ordinarily skilled in the art.

What is claimed is:
 1. A computer system comprising: a bus; a memory device coupled to said bus; a central processing unit generating a first memory access request for information associated with a first address of said memory device; a bus interface unit, coupled between said central processing unit and said bus, including a transmitting sub-unit and a receiving sub-unit, wherein said transmitting sub-unit receives said first memory access request for said information, generates a request packet for said information and prefetch information and places said request packet onto said bus, and said receiving sub-unit receives said information stored at said first address and said prefetch information associated with a speculative address and transfers at least said information to said central processing unit.
 2. The computer system according to claim 1, wherein said transmitting sub-unit includes a logic circuit for receiving said first memory access request from the central processing unit and for formatting said first memory access request into said request packet; the prefetch circuit, coupled to the logic circuit, for reconfiguring at least one information packet of said request packet so that said request packet requests said information and said prefetch information; and an increment address circuit for receiving said first address and producing said speculative address.
 3. The computer system according to claim 2, wherein said first memory access request includes at least said first address, a read/write parameter and a length request parameter.
 4. The computer system according to claim 3, wherein said prefetch circuit includes a storage element for storing a predetermined offset and an adder circuit for adding the predetermined offset to the length request parameter to produce a modified length request parameter.
 5. The computer system according to claim 4, wherein said request packet includes said first address, said read/write parameter and said modified length request parameter.
 6. The computer system according to claim 1, wherein said receiving sub-unit includes an output buffer for temporarily storing at least said information before outputting said information to said central processing unit; at least one input buffer for storing said prefetch information; a select element, coupled to said output buffer and said at least one input buffer, for outputting one of said information and said prefetch information into said output buffer; a control logic circuit, coupled to said select element, for controlling said select element; a de-select element, coupled to said select element and said at least one input buffer, for receiving in series said information and said prefetch information from said memory device and outputting said information to said select element and said prefetch information to one of said at least one buffer; and an address select circuit, coupled to said de-select element, for controlling said de-select element to output said information into said select element and to output said prefetch information into said at least one input buffer.
 7. The computer system according to claim 6, wherein said control logic circuit selects said select element to output said prefetch information if a second memory access request requests information associated with said speculative address.
 8. A computer system comprising: memory means for storing information; processor means for generating a first memory access request for information associated with a first address of said memory means; bus means for transferring at least said information stored at said first address from said memory means and said processor means; and bus interface means, coupled between said processor means and said bus means, for retrieving said information and prefetch information from said memory means, said bus interface means including: transmitting means for receiving said first memory access request for said information, for generating a second memory access request for said information and said prefetch information and for placing said second memory access request onto said bus means, and receiving means for receiving said information and said prefetch information associated with a speculative address and for transferring at least said information to said processor means.
 9. The computer system according to claim 8, wherein said transmitting means includes logic circuit means for receiving said first memory access request from the processor means and for formatting said first memory access request into a plurality of information packets; prefetch circuit means, coupled to the logic circuit means, for reconfiguring at least one of the plurality of information packets so that said plurality of information packets request said information and said prefetch information; and addressing means, coupled to said control processing unit, for receiving said first address and producing said speculative address.
 10. The computer system according to claim 9, wherein said first memory access request includes at least said first address, a device identification parameter, a read/write parameter and a length request parameter.
 11. The computer system according to claim 10, wherein said prefetch circuit means includes a storage element for storing a predetermined offset and an adder circuit for adding the predetermined offset to the length request parameter to produce a modified length request parameter.
 12. The computer system according to claim 11, wherein said second memory access request includes said first address, said device identification parameter, said read/write parameter and said modified length request parameter.
 13. The computer system according to claim 8, wherein said receiving means includes output buffer means for temporarily storing at least said information before outputting said information to said processor means; input buffer means for storing said prefetch information; select means, coupled to said output buffer means and said input buffer means, for outputting one of said information and said prefetch information into said output buffer means; control logic means, coupled to said select means, for controlling said select means to output one of said information and said prefetch information to said output buffer means; and de-select means, coupled to said select means and said input buffer means, for receiving in series said information and said prefetch information from said memory means and outputting said information to said select means and said prefetch information to said input buffer means; and address select means, coupled to said de-select means, for controlling said de-select means to output said information into said select means and to output said prefetch information into said input buffer means.
 14. A bus interface unit, coupled between a central processing unit and a bus, comprising: a transmitting sub-unit for receiving a first memory access request for information associated with a first address, generating a request packet for said information and prefetch information associated with a speculative address and placing said request packet onto said bus; and a receiving sub-unit for receiving said information and said prefetch information and transferring at least said information to said central processing unit.
 15. The bus interface unit according to claim 14, wherein said transmitting sub-unit includes a logic circuit for receiving said first memory access request from the central processing unit and for formatting said first memory access request into said request packet; the prefetch circuit, coupled to the logic circuit, for reconfiguring at least one information packet of said request packet so that said request packet requests said information and said prefetch information; and an increment address circuit for receiving said first address and producing said speculative address.
 16. The bus interface unit according to claim 15, wherein said first memory access request includes at least said first address, a read/write parameter and a length request parameter.
 17. The bus interface unit according to claim 16, wherein said prefetch circuit includes a storage element for storing a predetermined offset and an adder circuit for adding the predetermined offset to the length request parameter to produce a modified length request parameter.
 18. The bus interface unit according to claim 7, wherein said request packet includes said first address, said read/write parameter and said modified length request parameter.
 19. The bus interface unit according to claim 14, wherein said receiving sub-unit includes an output buffer for temporarily storing at least said information before outputting said information to said central processing unit; at least one input buffer for storing said prefetch information; a select element, coupled to said output buffer and said at least one input buffer, for outputting one of said information and said prefetch information into said output buffer; a control logic circuit, coupled to said select element, for controlling said select element to output one of said information and said prefetch information to said output buffer; a de-select element for receiving in series said information and said prefetch information transmitted along said bus and outputting said information to said select element and said prefetch information to said at least one input buffer; and an address select circuit, coupled to said de-select element, for controlling said de-select element to output said information into said select element and to output said prefetch information into said at least one input buffer.
 20. The bus interface circuit according to claim 19, wherein said control logic circuit selects said select element to output said prefetch information if a second memory access request, immediately subsequent to said first memory access request, requests information associated with said speculative address.
 21. A bus interface circuit, coupled between a processor and a bus, for retrieving said information and prefetch information from a memory device, said bus interface circuit including: transmitting means for receiving a first memory access request for information associated with a first address, for generating a second memory access request for said information and prefetch information associated with a second address and for placing said second memory access request onto said bus; and receiving means for receiving said information and said prefetch information and for transferring at least said information to said processor.
 22. The bus interface unit according to claim 19, wherein said transmitting means includes logic circuit means for receiving said first memory access request from the processor and for formatting said first memory access request into a plurality of information packets; and prefetch circuit means, coupled to the logic circuit means, for reconfiguring at least one of the plurality of information packets so that said plurality of information packets request said information and said prefetch information.
 23. The bus interface unit according to claim 22, wherein said first memory access request includes at least said first address, a read/write parameter and a length request parameter.
 24. The bus interface unit according to claim 22, wherein said prefetch circuit means includes a storage element for storing a predetermined offset and an adder circuit means for adding the predetermined offset to the length request parameter to produce a modified length request parameter.
 25. The bus interface unit according to claim 24, wherein said second memory access request includes said first address, said read/write parameter and said modified length request parameter.
 26. The bus interface unit according to claim 19, wherein said receiving means includes output buffer means for temporarily storing at least said information before outputting said information to said processor; input buffer means for storing said prefetch information; select means, coupled to said output buffer means and said input buffer means, for outputting one of said information and said prefetch information into said output buffer means; control logic means, coupled to said select means, for controlling said select means; de-select means, coupled to said select means and said input buffer means, for receiving in series said information and said prefetch information and for outputting said information to said select means and said prefetch information to said input buffer means; and address select means, coupled to said de-select means, for controlling said de-select means.
 27. A method for prefetching information comprising the steps of: a) generating by a processor a first read request for information associated with a first address; b) generating a request packet including a plurality of information packets containing said first address and a speculative address; c) transmitting said request packet to a memory device; d) retrieving said information associated with said first address and prefetch information associated with said speculative address; e) storing said information associated with said first address in an output buffer; f) storing said information associated with said speculative address in at least one input buffer; and, g) transmitting said information from said output buffer to said central processing unit; and h) transmitting said prefetch information into said output buffer if a subsequent read request requests information associated with said speculative address.
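
By way of illustration only, and forming no part of the claims, the steps recited in claim 27 together with the adder circuit of claims 11, 17 and 24 may be sketched in software as follows. The sketch assumes that addresses are line indices, that the predetermined offset is one line, and that the processor requests a single line at a time; the names `RequestPacket`, `BusInterfaceUnit`, `PREFETCH_OFFSET` and `bus_requests` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass

PREFETCH_OFFSET = 1  # assumed "predetermined offset", counted in lines


@dataclass
class RequestPacket:
    address: int   # first address (assumed to be a line index)
    read: bool     # read/write parameter
    length: int    # length request parameter, counted in lines


class BusInterfaceUnit:
    """Software stand-in for the claimed interface; not the actual circuitry."""

    def __init__(self, memory):
        self.memory = memory       # dict: line address -> data (stands in for DRAM)
        self.input_buffer = {}     # prefetch buffer, addressable per line
        self.output_buffer = None  # holds the data returned to the processor
        self.bus_requests = 0      # counts request packets placed on the bus

    def build_packet(self, address, length=1):
        # Adder step: predetermined offset + length -> modified length.
        return RequestPacket(address, True, length + PREFETCH_OFFSET)

    def read(self, address):
        # Step h: a later request that matches a prefetched line is
        # served from the input buffer without a new bus request.
        if address in self.input_buffer:
            self.output_buffer = self.input_buffer[address]
            return self.output_buffer

        # Steps b-c: build and transmit the enlarged request packet.
        packet = self.build_packet(address)
        self.bus_requests += 1

        # Step d: memory returns the requested line plus the speculative lines.
        lines = [self.memory.get(packet.address + i) for i in range(packet.length)]

        # Steps e-f: requested line to the output buffer, the rest to the input buffer.
        self.output_buffer = lines[0]
        self.input_buffer = {
            packet.address + i: lines[i] for i in range(1, packet.length)
        }

        # Step g: deliver the requested line to the processor.
        return self.output_buffer


memory = {i: f"line{i}" for i in range(8)}
biu = BusInterfaceUnit(memory)
first = biu.read(2)    # miss: one request packet covering lines 2 and 3
second = biu.read(3)   # hit: served from the prefetch (input) buffer
```

In this sketch the second read leaves `bus_requests` at 1, since the line at the speculative address was already retrieved by the first request packet.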